On Variance Reduction in Stochastic Gradient Descent and its Asynchronous Variants
Abstract
We study optimization algorithms based on variance reduction for stochastic gradient descent (SGD). Remarkable recent progress has been made in this direction through the development of algorithms such as SAG, SVRG, and SAGA, which have been shown to outperform SGD both theoretically and empirically. However, asynchronous versions of these algorithms, a crucial requirement for modern large-scale applications, have not been studied. We bridge this gap by presenting a unifying framework for many variance reduction techniques. We then propose an asynchronous algorithm grounded in this framework and prove its fast convergence. An important consequence of our general approach is that it yields asynchronous versions of variance reduction algorithms such as SVRG and SAGA as a byproduct. Our method achieves near-linear speedup in the sparse settings common in machine learning. We demonstrate its empirical performance through a concrete realization of asynchronous SVRG.
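As a rough illustration of what SVRG and SAGA share, the following minimal Python sketch assumes their standard textbook formulations, not the paper's own framework. Both methods can be written with one gradient estimator, v = ∇f_i(x) − ∇f_i(α_i) + (1/n) Σ_j ∇f_j(α_j). They differ only in how the anchor points α_j are refreshed. The function names `grad_i`, `full_mean`, and `vr_sgd`, the least-squares loss, the step size, the default epoch length 2n, and the example data are all hypothetical choices made for illustration. The sketch runs serially and does not model the asynchronous, parallel execution the paper analyzes.

```python
import random
from typing import List, Optional

Vector = List[float]


def grad_i(a: Vector, b_i: float, x: Vector) -> Vector:
    """Gradient of f_i(x) = 0.5 * (a . x - b_i)^2, i.e. (a . x - b_i) * a."""
    r = sum(aj * xj for aj, xj in zip(a, x)) - b_i
    return [r * aj for aj in a]


def full_mean(grads: List[Vector]) -> Vector:
    """Coordinate-wise average of a list of gradient vectors."""
    n = len(grads)
    return [sum(g[k] for g in grads) / n for k in range(len(grads[0]))]


def vr_sgd(A: List[Vector], b: Vector, x0: Vector, step: float, iters: int,
           method: str = "svrg", epoch_len: Optional[int] = None,
           seed: int = 0) -> Vector:
    """Minimize (1/n) * sum_i f_i(x) with the (hypothetical helper) estimator
    v = grad_i(x) - grad_i(anchor_i) + mean_j grad_j(anchor_j).

    "svrg": every anchor is one shared snapshot, refreshed every epoch_len
            steps with a full pass over the data.
    "saga": after each step only the sampled anchor moves to the current x;
            the n anchor gradients are stored in a table.
    """
    if method not in ("svrg", "saga"):
        raise ValueError("method must be 'svrg' or 'saga'")
    n, d = len(A), len(x0)
    rng = random.Random(seed)
    x = list(x0)
    epoch_len = epoch_len or 2 * n  # assumed default epoch length
    snapshot = list(x)
    # Gradients at the anchors; only SAGA keeps them after initialization.
    table = [grad_i(A[j], b[j], x) for j in range(n)]
    mean = full_mean(table)
    if method == "svrg":
        table = []  # SVRG recomputes grad_i at the snapshot instead
    for t in range(iters):
        if method == "svrg" and t > 0 and t % epoch_len == 0:
            snapshot = list(x)  # new snapshot: one full gradient pass
            mean = full_mean([grad_i(A[j], b[j], snapshot) for j in range(n)])
        i = rng.randrange(n)
        g = grad_i(A[i], b[i], x)
        # SVRG recomputes the anchor gradient; SAGA reads it from the table.
        g_anchor = grad_i(A[i], b[i], snapshot) if method == "svrg" else table[i]
        v = [g[k] - g_anchor[k] + mean[k] for k in range(d)]
        if method == "saga":
            # Move anchor i to the current x and keep the running mean in sync.
            mean = [mean[k] + (g[k] - table[i][k]) / n for k in range(d)]
            table[i] = g
        x = [x[k] - step * v[k] for k in range(d)]
    return x


# Made-up, inconsistent least-squares data: A^T A = 3I and A^T b = (7, -4),
# so the minimizer is x* = (7/3, -4/3).
A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, -1.0]]
b = [2.0, -1.0, 1.0, 4.0]
x_svrg = vr_sgd(A, b, [0.0, 0.0], step=0.05, iters=4000, method="svrg")
x_saga = vr_sgd(A, b, [0.0, 0.0], step=0.05, iters=4000, method="saga")
print("SVRG:", x_svrg)
print("SAGA:", x_saga)
```

The two variants trade memory for computation differently. In this sketch, SVRG recomputes ∇f_i at the snapshot and pays for a full pass each epoch. SAGA instead stores n gradients. The data are inconsistent, so ∇f_i(x*) ≠ 0 for some i, and plain SGD with this constant step would keep fluctuating around x*. The variance-reduced estimate shrinks to zero near x*, which lets both methods converge with a constant step size.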
Similar resources
Asynchronous Stochastic Gradient Descent with Variance Reduction for Non-Convex Optimization
We provide the first theoretical analysis of the convergence rate of the asynchronous stochastic variance reduced gradient (SVRG) descent algorithm on nonconvex optimization. Recent studies have shown that asynchronous stochastic gradient descent (SGD) based algorithms with variance reduction converge at a linear rate on convex problems. However, there is no work to analyze asy...
Asynchronous Distributed Semi-Stochastic Gradient Optimization
With the recent proliferation of large-scale learning problems, there has been a lot of interest in distributed machine learning algorithms, particularly those based on stochastic gradient descent (SGD) and its variants. However, existing algorithms either suffer from slow convergence due to the inherent variance of stochastic gradients, or have a fast linear convergence rate but at t...
IS-ASGD: Importance Sampling Accelerated Asynchronous SGD on Multi-Core Systems
Variance reduction (VR) algorithms for accelerating the convergence of stochastic gradient descent (SGD) have recently been developed with great effort. Two of its variants, stochastic variance-reduced gradient (SVRG) and importance sampling (IS), have achieved impressive progress. Meanwhile, asynchronous SGD (ASGD) is becoming more important due to the ever-increasing scale of optimization problems...
Asynchronous Accelerated Stochastic Gradient Descent
Stochastic gradient descent (SGD) is a widely used optimization algorithm in machine learning. In order to accelerate the convergence of SGD, a few advanced techniques have been developed in recent years, including variance reduction, stochastic coordinate sampling, and Nesterov’s acceleration method. Furthermore, in order to improve the training speed and/or leverage larger-scale training data...
Variance Reduction for Distributed Stochastic Gradient Descent
Variance reduction (VR) methods boost the performance of stochastic gradient descent (SGD) by enabling the use of larger, constant stepsizes and preserving linear convergence rates. However, current variance reduced SGD methods require either high memory usage or an exact gradient computation (using the entire dataset) at the end of each epoch. This limits the use of VR methods in practical dis...
Publication date: 2015